Abstract


Wheat flour is one of the most important and strategic food resources, especially in developing countries. The
addition of sodium hydrosulfite to flour to improve some appearance features can have dangerous impacts on
consumer health. Therefore, detection of this harmful substance is of great practical significance. In the present
study, the potential of Fourier transform-mid infrared (FT-MIR) spectroscopy in the 400-4000 cm⁻¹ range for the fast
detection of sodium hydrosulfite powder in wheat flour was investigated. After the spectral data were acquired from
the samples, preprocessing methods were first applied to correct unwanted effects in the spectral data.
Principal Component Analysis (PCA) as an unsupervised model, Support Vector Machine (SVM) and Artificial
Neural Network (ANN) as supervised classification models, and Partial Least Squares Regression (PLSR)
as a regression model were then applied to detect and quantify the adulteration in pure flour samples. The best outcomes
were accuracies of 86.66% and 86.70% for the SVM and ANN models, respectively, with S-G + D2 + SNV preprocessing,
and R² = 0.99 for the PLSR model.

Introduction

Bread, as one of the most significant sources
of the nutrients the body requires daily
(such as proteins, minerals and vitamins), is one
of the staple foods in many countries,
particularly in Iran (Ahamadabadi et al., 2016;
GhR, Yunesian, Vaezi, Nabizadeh, & GhA,
2006; Sabeghi, 2004). The consumption of
bread in Iran is five times higher than in Europe
(Malakootian & Dowlatshahi, 2005; Sabeghi,
2004). Among the main ingredients of bread,
wheat flour has a special place and is directly
related to the quality of bread and to the
health of consumers. Therefore, it should be
certified to the Iranian national standard.
Wheat contains 78.10% carbohydrate, 14.70%
protein, 2.10% fat, 2.10% minerals and a
noticeable proportion of vitamins (Adams,
Lombi, Zhao, & McGrath, 2002; Shewry, 2009;
Shewry et al., 2006; Topping, 2007).
According to the statistics of the World Health
Organization and the Food and Agriculture
Organization of the United Nations, 25 types of
food additives are used in each country
according to its food safety policy (Martins,
Sentanin, & De Souza, 2019). Their maximum
acceptable amounts, and the assurance that no
unauthorized additives are present,
should be considered. Sodium
hydrosulfite, also known as Blankit, is a white
crystalline powder containing inorganic sulfur
compounds (Reza et al., 2014). In the food
industry, this material is applied to nuts, sugar,
etc. to avoid browning, and for bleaching and
regeneration of cellulose fibers (de Carvalho &
Schwedt, 2005). Sodium hydrosulfite has been
utilized in the Iranian bread industry to hide visible
defects of bread by affecting the speed of the
production process and compensating for some
visible results of a lack of natural fermentation
and poor flour quality (Asgari,
SeidMohammadi, Faradmal, Moradi, & Yari,
2018). This material has dangerous effects
on human health. Adverse effects of Blankit
include the elimination of and damage to villi in
the stomach and intestines in the long term;
therefore, it can cause the development of
gastrointestinal cancer. It is also known to be an
effective factor in developing diabetes (Karami,
Alikord, Mokhtari, Sadighara, & Jahed-Khaniki,
2021). Therefore, detection of this
harmful material in the human diet is
essential. In general, different approaches have 
been applied to quantify sulfur compounds in food,
such as titration (Monnier & Wiliams, 1972),
liquid and gas chromatography (Rethmeier,
Rabenstein, Langer, & Fischer, 1997), high
performance ion chromatography (Lavigne-Delcroix,
Tusseau, & Proix, 1996), and
electroanalytical methods, which study the
electrical activity of sulfites, including voltammetry
(Govaert, Temmerman, & Kiekens, 1999),
amperometry, potentiometry, and the
general evaluation of sulfites in automated
systems (Pisoschi et al., 2020). These
techniques have some drawbacks, such as
being high-cost, laborious, and destructive.
Therefore, nondestructive,
inexpensive and fast methods are required.
Fourier Transform infrared (FT-IR)
spectroscopy is one of the fingerprint
techniques widely used to identify
components of food and determine possible
impurities. FT-IR spectroscopy can be operated
in the middle range (450-4000 cm⁻¹, FT-MIR)
or near range (4000-10000 cm⁻¹, FT-NIR)
(Pallone, dos Santos Caramés, & Alamar,
2018). FT-MIR provides more structural
and chemical information than Fourier
Transform-Near Infrared (FT-NIR) because it
can show the vibrational and rotational
motions of covalent bonds (Lohumi,
Lee, Lee, & Cho, 2015). Some researchers have
explored the applicability of spectroscopic
techniques to investigate chemical information
of materials. Mohamed et al. explored
classification of five food powder types (wheat
flour, organic wheat flour, rice flour, corn
starch, and tapioca starch) and reported that the
Support Vector Machine (SVM) model had
acceptable outcomes for classification of
these powders (Mohamed, Solihin, Astuti,
Ang, & Zailah, 2019). In another study,
De Girolamo et al. applied FT-IR techniques in
different ranges (FT-MIR and FT-NIR) to
detect the adulteration of durum wheat pasta
with common wheat. Linear Discriminant
Analysis (LDA) and Partial Least Squares-Discriminant
Analysis (PLS-DA) gave
results of 80 and 95% for the three-class dataset and
91 and 97% for the two-class datasets (De Girolamo
et al., 2020). However, to the best of our knowledge,
the applicability of FT-MIR spectroscopy
combined with ANN for
classification and a PLSR model for
quantification of adulteration with this harmful
material in Iranian wheat flour has not been
investigated. In the present study, the
applicability of FT-MIR spectroscopy
combined with chemometric methods and
various preprocessing algorithms for detection
and quantification of sodium hydrosulfite in
wheat flour in Iran was studied.


Materials and Methods 

In the present research, after preparing
samples, spectral data were acquired and
preprocessed. Both supervised and
unsupervised models were then applied, and
the results were analyzed for detection and
quantification of pure and adulterated samples.
Fig. 1 shows the flowchart of the flour
adulteration detection procedure by FT-MIR
spectroscopy.

Kazemi et al. Application of FT-IR Spectroscopy with Various Classification and Regression ... 19

[Figure 1: flowchart; recoverable box labels: classification, unsupervised, comparison of results]

Fig. 1. The schematic flowchart of the steps of present study


Sample Preparation 

Sardari wheat seeds (harvested in 2021)
were purchased from a seed modifying center
in Bonab, East Azerbaijan, Iran. The seeds were
harvested at four distinct locations in Iran
to account for the geographical
variation of samples. Sardari wheat was
selected because it is the wheat variety with the
largest harvested area in Iran. Sodium hydrosulfite
(with a purity of 90%) was acquired from a
supermarket in Bonab, Iran. First, the wheat seeds
were milled with a laboratory benchtop mill to obtain
wheat flour. Then the flour was passed
through a sieve (mesh 420 µm) to get a
homogeneous flour sample. The considered
adulterant concentrations (w/w) were 10, 15,
20, and 25%. In total, 150 samples were
prepared (25 of pure flour, 25 of sodium
hydrosulfite, and 25 for each of the 4 adulterant
groups). After the adulterant was added to pure
flour at the mentioned levels, the mixtures were
blended intensely to make the samples as
homogeneous as possible. Finally, the
prepared samples were placed in
microtubes for transfer to the laboratory.
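
The sample design described above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the total mass per sample (`total_mass_g`) is a hypothetical value, since the text does not state it, and the helper name `mixture_masses` is ours.

```python
# Hypothetical total mass of one prepared sample (not stated in the text).
total_mass_g = 2.0

# Adulterant levels (% w/w) and samples per group, as given in the text.
levels_percent = [10, 15, 20, 25]
samples_per_group = 25


def mixture_masses(level_percent, total_mass):
    """Return (adulterant mass, flour mass) for a given w/w percentage."""
    adulterant = total_mass * level_percent / 100.0
    return adulterant, total_mass - adulterant


# Pure flour + pure sodium hydrosulfite + four adulterated groups.
n_groups = 2 + len(levels_percent)
total_samples = n_groups * samples_per_group

for level in levels_percent:
    adulterant_g, flour_g = mixture_masses(level, total_mass_g)
    print(f"{level}% w/w: {adulterant_g:.2f} g adulterant + {flour_g:.2f} g flour")
print("total samples:", total_samples)
```

With six groups of 25 samples, the count agrees with the 150 samples reported.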


Spectra Acquisition 


Spectral data were acquired at the central
laboratory of Tabriz University with an FT-MIR
spectrometer (TENSOR 27, Bruker, Germany)
in transmittance mode with a resolution of 1
cm⁻¹. The scanning speed was 20 kHz with
64 scans. Each powdered sample was placed on
the ATR (single bounce) crystal and pressed
until the desired signal density was acquired. The
crystal was washed with 100% ethanol after
testing each sample. For each individual
sample, 3 transmittance spectra were acquired
and the mean spectrum of the three replicates was used
for further analysis. Finally, the mean spectra
were transferred to Excel 2019 to be
prepared for statistical analysis. Multivariate
statistical analysis was conducted with
Unscrambler v 10.4 (Camo software AS, Oslo,
Norway, 2011) for PCA and SVM, and with the
classification Toolbox in Matlab (Mathworks,
Inc., Natick, Massachusetts, USA) for Artificial
Neural Networks (ANN).
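
The replicate-averaging step described above can be sketched as follows. This is a minimal illustration on synthetic numbers, not the authors' Excel workflow; the array layout (samples x replicates x wavenumbers) is an assumption, and the 1886 spectral points are taken from the PCA data matrix described later in the paper.

```python
import numpy as np

n_samples, n_replicates, n_wavenumbers = 150, 3, 1886

# Synthetic stand-in for the measured transmittance spectra.
rng = np.random.default_rng(0)
replicates = rng.random((n_samples, n_replicates, n_wavenumbers))

# Average over the replicate axis: one mean spectrum per sample.
mean_spectra = replicates.mean(axis=1)
print(mean_spectra.shape)  # (150, 1886)
```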


Preprocessing 

Before classification or regression
modeling, pretreatment of spectral data is an
essential step to remove unwanted and
uninformative variation. Such variation can be due to a large
amount of water in samples, different
conditions of samples, and noise in spectra that
comes from electronic components in the
system (Boysworth & Booksh, 2008; Christy &
Kvalheim, 2007; Varmuza & Filzmoser, 2016).
The most commonly applied preprocessing
techniques in spectroscopy are divided into two
categories: spectral normalization and spectral
derivatives (Rinnan et al., 2009). Spectral
normalization techniques, which include
Standard Normal Variate (SNV), Multiplicative
Scatter Correction (MSC), and de-trending
(DT), can be used for correction of scattering
effects, while spectral derivatives
(first and second derivatives and smoothing
techniques) are applied for correction of peak
overlap and baseline drifts (López-Maestresalas
et al., 2019). SNV and MSC
are the most commonly used algorithms to
correct scatter effects. The difference
between them is that
the scatter correction in SNV
is based on the mean and standard deviation of each
individual spectrum, whereas MSC requires a
reference spectrum (usually the average spectrum)
against which every spectrum is corrected (Dhanoa,
Lister, Sanderson, & Barnes, 1994; Zeaiter,
Roger, & Bellon-Maurel, 2005). Among the
spectral derivative methods, Savitzky-Golay
(S-G) is the most common algorithm for
derivation (Savitzky & Golay, 1964). In this
method, the data within a chosen window are
fitted by a polynomial whose degree
must also be chosen (Barak, 1995). In the present
study, S-G (with a window size of ten), SNV,
MSC, first and second derivatives, and
combinations of them were applied.
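
The preprocessing steps named above can be sketched in Python as follows. This is an illustrative sketch, not the Unscrambler implementation used in the study. The function names are ours, the polynomial order (2) is an assumption because the text does not give it, and the window length is set to 11 as an assumed odd stand-in for the reported window size of ten. The spectra are synthetic.

```python
import numpy as np
from scipy.signal import savgol_filter


def snv(X):
    """Standard Normal Variate: scale each spectrum (row) by its own mean and std."""
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)
    return (X - mean) / std


def msc(X, reference=None):
    """Multiplicative Scatter Correction against a reference (default: mean spectrum)."""
    ref = X.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(X, dtype=float)
    for i, spectrum in enumerate(X):
        # Fit spectrum ~ intercept + slope * ref, then undo offset and gain.
        slope, intercept = np.polyfit(ref, spectrum, 1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected


def sg_derivative(X, deriv, window_length=11, polyorder=2):
    """Savitzky-Golay smoothing (deriv=0) or derivative along the wavenumber axis."""
    return savgol_filter(X, window_length=window_length,
                         polyorder=polyorder, deriv=deriv, axis=1)


def sg_d2_snv(X):
    """The 'S-G + D2 + SNV' combination: second derivative, then SNV."""
    return snv(sg_derivative(X, deriv=2))


# Synthetic spectra: one band shape with random offsets and gains (scatter-like effects).
rng = np.random.default_rng(0)
wavenumbers = np.linspace(400, 4000, 200)
base = np.exp(-((wavenumbers - 1050) / 80.0) ** 2)
offsets = rng.normal(0.0, 0.05, size=(5, 1))
gains = rng.uniform(0.8, 1.2, size=(5, 1))
X = offsets + gains * base

X_msc = msc(X)      # all rows collapse onto the mean spectrum
X_pre = sg_d2_snv(X)  # offsets and gains are removed by D2 followed by SNV
```

Because the synthetic spectra differ only by an offset and a gain, both corrections map them onto a single curve, which is the behaviour scatter correction is meant to provide.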


Classification 

Spectra contain high volumes of
information, which are very difficult to
interpret by visual inspection only.
Chemometrics is a tool for extracting this
information from multivariate chemical
data using mathematics. It is
generally applied to explore patterns of
association in data, to track properties of materials
on a continuous basis, or to prepare and use
multivariate classification models. After the
various preprocessing techniques were applied, the
main models were built on the preprocessed data
and their outputs were produced. Both
unsupervised and supervised techniques for
classification were utilized in this study.


Unsupervised Classification 

In the first step of data exploration,
Principal Component Analysis (PCA) is usually
applied to recognize any possible separated
groups. The main objective of the PCA model, as an
unsupervised modelling method, is to decrease
the dimensionality of the data while preserving
the variation present in it (Jolliffe, 1986). The
reduction of dimensionality is done by defining
new variables, principal components (PCs), that
consist of linear combinations of the original data
(Kamruzzaman, Barbin, ElMasry, Sun, &
Allen, 2012). The first PC represents the most
variance of the dataset, and the next PCs, which are
orthogonal to the preceding ones, contain
most of the remaining variance (Fodor, 2002).
The data matrix used for the PCA model in
this study consists of 1886 columns
(corresponding to the recorded wavenumbers)
and 150 rows (corresponding to the number of
samples).
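
The PCA decomposition described above can be sketched with a singular value decomposition. This is a minimal illustration on random numbers with the same 150 x 1886 shape, not the Unscrambler computation used in the study; the function name `pca` is ours.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA via SVD of the mean-centered data matrix (rows = samples)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # sample coordinates on the PCs
    loadings = Vt[:n_components]                     # PC directions over wavenumbers
    explained = s ** 2 / np.sum(s ** 2)              # fraction of variance per PC
    return scores, loadings, explained[:n_components]


rng = np.random.default_rng(0)
X = rng.normal(size=(150, 1886))  # synthetic stand-in for the spectra
scores, loadings, explained_ratio = pca(X)
print(scores.shape, loadings.shape)  # (150, 2) (2, 1886)
```

The `explained_ratio` values play the role of the percentages quoted for PC1 and PC2 in a score plot.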


Supervised Classification 
SVM Model 

The Support Vector Machine classification
(SVMC) is a supervised classification
technique that utilizes kernel functions to
represent the original space in the form of a
feature space. It determines the best separation
between classes by applying a unique
hyperplane to the dataset (Ballabio &
Todeschini, 2009; Fletcher, 2009; Vapnik,
1999). The final classification outcomes of
SVM are determined by a small number of
Support Vectors (SVs), which are the samples lying on
the margins of the model, closest to the
boundaries between classes. To build the
classification models and evaluate their
performance, calibration and test datasets were
used, respectively: 70% of the data was assigned
to training and 30% was used as the
test dataset. In the SVM model, various kernel
functions, including linear, Radial Basis
Function (RBF), sigmoid, and polynomial,
can be employed (Chandrasekaran,
Panigrahi, Ravikanth, & Singh, 2019). The
kernel function must be selected carefully,
since its type directly
impacts the model’s performance and outcomes
(Kazemi, Mahmoudi, & Khojastehnazhand,
2023).
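
A comparison of the four kernels with a 70/30 split can be sketched with scikit-learn. This is an assumption-laden illustration, not the Unscrambler workflow used in the study: the data are synthetic (five classes of 25 samples, 50 features), the kernel hyperparameters are scikit-learn defaults, and the stratified split is our choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in: 5 classes x 25 samples, class means shifted along all features.
rng = np.random.default_rng(0)
n_classes, n_per_class, n_features = 5, 25, 50
y = np.repeat(np.arange(n_classes), n_per_class)
X = rng.normal(size=(y.size, n_features)) + 0.5 * y[:, None]

# 70% training, 30% test, keeping the class proportions in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Fit one SVM per kernel and record the test accuracy.
accuracy = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    model = SVC(kernel=kernel).fit(X_train, y_train)
    accuracy[kernel] = model.score(X_test, y_test)

for kernel, value in accuracy.items():
    print(f"{kernel}: {100 * value:.1f}%")
```

Which kernel scores best depends on the data; the numbers printed here say nothing about the flour spectra.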


ANN Model 

Recently, the Artificial Neural Network (ANN),
inspired by human brain function, has been
one of the most commonly used modeling
techniques for classification. Neural
networks rely on input, hidden, and
output layers, each containing varying numbers
of neurons. Neurons have weights assigned to
them during the model’s training and hold
the inputs and intermediate calculations of the
layers. Training an ANN model
starts from randomly assigned weights.
The present study employed a feed-forward
network, in which
70% of the data was
used for training and the remaining
30% for testing.
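
The structure of a feed-forward network with randomly initialized weights can be sketched as below. This shows only a forward pass, with no training, and it is not the Matlab toolbox model used in the study. The hidden-layer size (10), the number of output classes (5), the tanh activation and the softmax output are all assumptions made for illustration; only the 1886 inputs come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_classes = 1886, 10, 5  # n_hidden and n_classes are assumed

# Random initial weights: the starting point before any training.
W1 = rng.normal(0.0, 0.01, size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.01, size=(n_hidden, n_classes))
b2 = np.zeros(n_classes)


def forward(X):
    """Input layer -> hidden layer (tanh) -> output layer (softmax probabilities)."""
    hidden = np.tanh(X @ W1 + b1)
    logits = hidden @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


X = rng.normal(size=(4, n_inputs))  # four synthetic spectra
probabilities = forward(X)
print(probabilities.shape)  # (4, 5)
```

Training would then adjust `W1`, `b1`, `W2` and `b2` to reduce the classification error on the 70% training portion.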


Regression Modeling 

After classification of samples, the
adulteration level was predicted
using partial least squares regression (PLSR).
PLSR helps to model the
relationship between the spectral data and the
properties that need to be quantified. By
distinguishing between X and Y variables,
PLSR defines a set of new features named latent
variables, which are
orthogonal linear combinations of the X
variables (Peng, Cheng, Wang, & Zhu, 2020).
In the present study, the PLSR model was applied
to the FT-MIR spectra to investigate the
possibility of predicting the percentage of sodium
hydrosulfite adulteration in wheat flour. The
reliability of the acquired prediction model was
explored using external validation data: 70%
of the dataset was used to build the calibration
model and 30% of the dataset was used for
testing the created model.
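
How latent variables are built and used for prediction can be sketched with the NIPALS algorithm for a single response (PLS1). This is a generic textbook-style sketch on synthetic data, not the software used in the study; the function names, the two latent variables and the synthetic "spectra" are assumptions (the paper reports seven LVs for its real data).

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit PLS1 with NIPALS; returns regression coefficients and the centering means."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    X_res, y_res = X - x_mean, y - y_mean  # residual blocks, deflated at each step
    weights, loadings, y_loadings = [], [], []
    for _ in range(n_components):
        w = X_res.T @ y_res
        w = w / np.linalg.norm(w)       # weight vector (direction in X space)
        t = X_res @ w                   # latent variable scores
        tt = t @ t
        p = X_res.T @ t / tt            # X loadings
        q = (y_res @ t) / tt            # y loading
        X_res = X_res - np.outer(t, p)  # deflate X
        y_res = y_res - q * t           # deflate y
        weights.append(w)
        loadings.append(p)
        y_loadings.append(q)
    W = np.column_stack(weights)
    P = np.column_stack(loadings)
    q = np.array(y_loadings)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean


def pls1_predict(X, coef, x_mean, y_mean):
    return (X - x_mean) @ coef + y_mean


# Synthetic data: the signal of an "adulterant" plus one interfering component.
rng = np.random.default_rng(0)
levels = np.repeat([0.0, 10.0, 15.0, 20.0, 25.0], 25)  # adulteration level (%)
interferent = rng.normal(size=levels.size)
s_adulterant = rng.normal(size=200)
s_other = rng.normal(size=200)
X = np.outer(levels, s_adulterant) + np.outer(interferent, s_other)
X += 0.01 * rng.normal(size=X.shape)

# 70% calibration, 30% test.
order = rng.permutation(levels.size)
n_cal = int(0.7 * levels.size)
cal, test = order[:n_cal], order[n_cal:]

coef, x_mean, y_mean = pls1_fit(X[cal], levels[cal], n_components=2)
predicted = pls1_predict(X[test], coef, x_mean, y_mean)
rmsep = np.sqrt(np.mean((predicted - levels[test]) ** 2))
print(f"RMSEP on synthetic test set: {rmsep:.4f}")
```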

The acquired classification models were assessed
by sensitivity and specificity according to
equations 1 and 2 (Kazemi, Mahmoudi, Veladi,
Javanmard, & Khojastehnazhand, 2022):

$$\text{Sensitivity (\%)} = \frac{TP}{TP + FN} \times 100 \qquad (1)$$

$$\text{Specificity (\%)} = \frac{TN}{TN + FP} \times 100 \qquad (2)$$

where TP (True Positive) is the number of
pure flour samples correctly
classified as pure samples; FP (False Positive)
is the number of mixed samples wrongly
classified as pure samples; TN (True Negative)
is the number of mixed samples correctly
classified as mixed; and FN (False Negative) is
the number of pure samples classified as mixed. Expressed
as fractions, these two statistical parameters take values between
0 and 1. The higher their values, the better the
classification performance of the models. In the
regression modeling, the Root Mean Square Error
(RMSE) of calibration (RMSEC) and prediction
(RMSEP) and the coefficient of determination
(R²) are important parameters that
evaluate the predictive power of a PLS
calibration model. Higher predictive power is
represented by higher R² and lower RMSE
(Pebriana, Rohman, Lukitaningsih, & Sudjadi,
2017; Rohman & Salamah, 2018). For PLS
calibration models developed to predict the
amount of adulteration in adulterated flour,
RMSEC and RMSEP can be calculated using
equations 3 and 4, where $Y_i$ and $\hat{Y}_i$ are the actual
and predicted values of an adulterated sample,
respectively, and M and N are the numbers of data in the
calibration and prediction sets, respectively
(Sikorska, Khmelinskii, & Sikorski, 2014).


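$$RMSEC = \sqrt{\frac{\sum_{i=1}^{M} (\hat{Y}_i - Y_i)^2}{M - 1}} \qquad (3)$$

$$RMSEP = \sqrt{\frac{\sum_{i=1}^{N} (\hat{Y}_i - Y_i)^2}{N}} \qquad (4)$$

$$\text{Accuracy (\%)} = \frac{TP + TN}{TP + FN + TN + FP} \times 100 \qquad (5)$$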
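
The regression metrics described above can be sketched in plain Python. This is an illustration under stated assumptions: RMSEC uses the M - 1 denominator that survives in the source, while the N denominator for RMSEP and the usual definition of R² (one minus the ratio of residual to total sum of squares) are assumed. The function names and the toy values are ours.

```python
import math


def rmsec(y_true, y_pred):
    """Root mean square error of calibration, with an M - 1 denominator."""
    m = len(y_true)
    sse = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sse / (m - 1))


def rmsep(y_true, y_pred):
    """Root mean square error of prediction, with an N denominator (assumed)."""
    n = len(y_true)
    sse = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sse / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy reference and predicted adulteration levels (%), for illustration only.
reference = [0.0, 10.0, 20.0]
predicted = [1.0, 9.0, 21.0]
print(rmsec(reference, predicted), rmsep(reference, predicted),
      r_squared(reference, predicted))
```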


Results and Discussion 
Spectra Interpretation
FT-MIR spectra of pure wheat flour, sodium
hydrosulfite, and adulterated samples with
different adulteration levels are displayed in Fig.
2. Because some peaks overlap, chemometric
tools are necessary to extract information.
Almost all of the classes except Blankit showed
similar peaks, with major
peaks at 1050 cm⁻¹, 1730 cm⁻¹, 2950 cm⁻¹, and
3400 cm⁻¹. The spectrum of pure Blankit
was different: apart from a few shared peaks such as that at 1050
cm⁻¹, it lacked most of the peaks of the other
classes, and it showed peaks
at 1950 cm⁻¹ and 2050 cm⁻¹
that the other classes did not have.
The basic bands at 2800-3040 cm⁻¹ are related
to C-H and C-H2 symmetric and asymmetric
stretching and are mainly attributed to band
vibrations of the lipids in the flours (Roa,
Santagapita, Buera, & Tolaba, 2014). The
bands with the maximum at 1640 cm⁻¹ are
associated with protein band vibrations (Guzman-Ortiz
et al., 2015). Furthermore, the spectra show a
strong absorption band from 900-1200 cm⁻¹
and C-H bending (1000 cm⁻¹), mainly related to
carbohydrates (Rodriguez, Rolandelli, &
Buera, 2019).


PCA Model 

PCA, as an unsupervised modeling method, was
applied to the dataset to decrease the dimensionality of the
data while preserving its variation. The
acquired FT-MIR data were processed by the PCA
model to explore the probable similarities and
differences among pure and adulterated flour
samples. Of the different
preprocessing techniques applied, the result of the
PCA model with (S-G + D1 + SNV) was the
best.


[Figure 2: FT-MIR spectra; legend: pure flour, Blankit, flour + 10% Blankit, flour + 15% Blankit, flour + 20% Blankit, flour + 25% Blankit]

Fig. 2. The acquired FT-MIR spectra for flour samples


The obtained score plot of the first two PCs
(PC1 = 88% and PC2 = 4%) is shown in Fig. 3.
This figure shows that all the pure samples
were projected on negative PC1 values. Thus,
PC1 provided a fair discrimination between
pure and adulterated samples. As can be
seen from Fig. 3, the pure flour samples
were grouped together and separated well from the
adulterated samples. Owing to the very different chemical
composition of pure sodium
hydrosulfite, the hydrosulfite samples were
well separated and were located on the other
side of PC1. Because of the similar
chemical composition of the adulterated samples,
there were some misclassifications between
groups with different adulteration levels.
Mishra et al. applied a PCA model combined
with hyperspectral imaging to detect
peanut traces in wheat flour; with
99.43% of the variance represented, pure and
adulterated samples were well distinguished
along PC1, similar to the present study (Mishra
et al., 2015). In addition, the results of the PCA
model in the present study were in agreement
with the result of a PCA model for discrimination
of wheat flour with other cereal flours (barley, 


represented good discrimination of barley flour 
samples from wheat flour. However, one type 
of wheat flour was located very close to other 
flours (Nur Arslan, 2020). 


[Figure 3: PCA score plot; horizontal axis: PC-1 (88%); recoverable legend entries: sodium hydrosulfite, flour + 15% sodium hydrosulfite, flour + 25% sodium hydrosulfite]

Fig. 3. The score plot of PCA model

SVM and ANN Models


Table 1 presents the accuracy of the SVM and
ANN models as supervised classification
methods after applying various
preprocessing methods and combinations of
them, for the training, validation, and test datasets. The SVM model was
implemented with four different kernel functions
(linear, polynomial, Radial Basis Function
(RBF), and sigmoid). In both models, 70% of the
data was randomly assigned to model training
and the other 30% was used for model testing. In
addition, 5% of the 70% training data of the neural network
model was used for validation. As
depicted in Table 1, the accuracy of the SVM
model with the linear kernel function and S-G +
D2 + SNV preprocessing was 86.66%, and that of
the ANN model was 86.70%. With the applied
preprocessing methods, the ANN model also
yielded acceptable results using the optimal
neural network structure shown in Fig. 4.

Fig. 5 shows the SVM graphical model after
employing S-G + D2 + SNV preprocessing.

As shown in Table 1, the linear kernel had better
results for all preprocessing techniques.

[Figure 4: neural network diagram; input layer with 1886 nodes]

Fig. 4. The structure of the optimal artificial neural network

[Figure 5: plot titled "Classification"]

Fig. 5. SVM graphical model for classification of samples


Table 1- The accuracy (%) of SVM and ANN models

Model            SVM                                                                                     ANN
Kernel function  Linear                Polynomial            RBF                   Sigmoid               -
Preprocessing    Train  Val    Test    Train  Val    Test    Train  Val    Test    Train  Val    Test    Train  Val    Test
S-G              78.75  68.75  75.55   12.50  11.25  6.66    10     13.75  6.66    17.5   18.75  20      88.9   80     80
S-G + D1         20     17.5   20      20     20     20      20     20     20      20     20     20      84.4   80     …
S-G + D1 + SNV   100    81.25  84.4    67.5   50     71.1    52.5   43.75  51.11   1.25   10     2.2     98.9   100    80
S-G + D2         20     20     20      20     20     20      20     20     20      20     20     20      50     60     …
S-G + D2 + SNV   …      81.25  86.66   68.75  53.75  68.88   61.25  …      55.55   …      23.75  4.44    …      100    86.7
S-G + D2 + MSC   20     20     20      20     20     20      20     20     20      20     20     20      98.9   100    80
S-G + SNV        93.75  70     80      30     30     30      30     28.75  26.66   10     12.5   8.88    85.6   86.4   73.3
Linear kernels work well when the
underlying relationship between the input
features and the target variable is approximately
linear. The better performance of the linear kernel
may be due to the nature of the dataset, which can be
separated or modeled effectively by linear
boundaries. Furthermore, in high-dimensional
spaces, such as spectroscopic data, linear kernels
can perform better than more complex kernels,
because complex kernels can exacerbate
the curse of dimensionality, making it harder to
find a suitable decision boundary. In addition,
given the better outcomes of the polynomial
kernel in comparison with the other kernels, it can
be concluded that the structure and nature of the
dataset tend toward a simple, linear form.

In another study, Yuan et al. employed NIR
spectroscopy to detect sodium
hydroxymethanesulfonate in wheat flour. Three
algorithms, PLS-DA, advanced K-means
dynamic clustering, and LS-SVM, were
used to establish the calibration models.
LS-SVM outperformed the other two
methods, with a classification accuracy of
94.70% for prediction (Yuan, Xiang, Yu, &
Xu, 2011). Although the outcome of the SVM
model in that research was better
than in the present study, it should be
noted that the SVM model in that
study classified two classes, whereas the
86.66% result of the present study was for
classification of five classes.

Although FTIR spectroscopy combined with
chemometric methods proved applicable
for detecting adulteration with sodium
hydrosulfite in wheat flour, there were some
limitations in the present study, which we hope
will be addressed in future studies.
Environmental conditions such as moisture
differ between the laboratory and bakeries or industrial sites.
Because the moisture was
similar in all samples, this was not an issue here, but
to apply this method in other
situations, the system would
have to be calibrated again. In addition, the
technique applied in the present study could be studied for
simultaneous detection of other adulterants in wheat flour.

In order to assess the
classification ability for each class in the SVM
model, the confusion matrix was investigated
for the test dataset (Table 2). The results were
assessed by calculating the sensitivity,
specificity, and accuracy statistical parameters.
As expected, the highest classification
accuracy was for class A (pure wheat flour).
Also, the accuracy of class C (adulterated at the
15% level) was 100%. The weakest
classification result was for class D (20%
adulterated). In class E (25% adulterated), 6
samples were classified correctly and 3
samples were assigned to class D. Since
these two classes differ by only 5% in adulteration
level, the classification result for this class
was also acceptable.


Table 2- The confusion matrix of SVM model for test dataset

              A     B     C     D     E
A             9     0     0     0     0
B             0     9     0     0     0
C             0     1     8     0     0
D             0     1     0     7     1
E             0     0     0     3     6
Sensitivity   1     1     0.88  0.77  0.66
Specificity   1     0.94  1     0.91  0.97
CCR (%)       100   81    100   70    85
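
The per-class statistics in Table 2 can be recomputed from the confusion matrix in a one-vs-rest fashion. The sketch below assumes that rows are the actual classes and columns the predicted classes, a reading that matches the description of class E in the text; it is our reconstruction, not the authors' code. Under that assumption the CCR row appears to correspond to the diagonal count divided by the column total.

```python
import numpy as np

# Confusion matrix as read from Table 2 (rows: actual A-E, columns: predicted A-E).
confusion = np.array([
    [9, 0, 0, 0, 0],
    [0, 9, 0, 0, 0],
    [0, 1, 8, 0, 0],
    [0, 1, 0, 7, 1],
    [0, 0, 0, 3, 6],
])

tp = np.diag(confusion)            # correctly classified samples per class
fn = confusion.sum(axis=1) - tp    # class members assigned to another class
fp = confusion.sum(axis=0) - tp    # other samples assigned to this class
tn = confusion.sum() - tp - fn - fp

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
column_rate = 100 * tp / (tp + fp)  # assumed meaning of the CCR (%) row
overall_accuracy = 100 * tp.sum() / confusion.sum()

print(np.round(sensitivity, 3))
print(np.round(specificity, 3))
print(np.round(column_rate, 1))
print(round(overall_accuracy, 2))
```

The overall accuracy computed this way is 39 correct out of 45 test samples, which agrees with the 86.66% reported for the linear-kernel SVM; the table values look truncated rather than rounded (for example 8/9 is listed as 0.88).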


PLSR Model 

In order to quantify the adulterant in wheat
flour, an FT-MIR spectroscopy-based regression
model (PLSR) was built. This regression
model is based on developing an algebraic
correlation between the quantity of
adulteration in wheat flour samples and the
absorption of the sample at different
wavelengths. The ideal calibration model was
determined based on the lowest RMSEC and
RMSECV and the highest R²c and R²cv. The values
of these statistical parameters as well as
the number of latent variables (LVs) are
presented in Table 3.


Table 3- The results of PLSR model in predicting the adulteration level using different preprocessing methods

Preprocessing     LV   Calibration        Test
                       R²      RMSE       R²      RMSE
S-G               7    0.995   0.118      0.994   0.123
S-G + D1          7    1.00    2.46       1.00    2.39
S-G + D1 + SNV    7    0.987   0.36       0.979   0.82
S-G + D2          7    1.00    3.94       1.00    2.87
S-G + D2 + SNV    7    0.975   0.265      0.967   0.312
S-G + D2 + MSC    7    1.00    2.87       1.00    2.50
S-G + SNV         7    0.992   0.15       0.989   0.18

As shown in Table 3, the best PLS model
was obtained with the S-G preprocessing method
using seven LVs, which showed the best
prediction performance (R²cv = 0.994 and
RMSECV = 0.123). The similarity of the training and
test results indicates the model’s good ability
to predict precise levels of adulteration.
The performance of PLSR was also externally
validated using the test set of samples, as
shown in Fig. 6. The PLS prediction plot
illustrates that the PLSR model displayed
very good prediction ability (R²p = 0.992). Fig. 6
shows the relationship between the reference values
obtained in the laboratory and the
predicted values.


[Figure 6: PLSR prediction plot; horizontal axis: Reference; vertical axis: Predicted; fitted line y = 0.9797x + 0.1073, R² = 0.9926]

Fig. 6. The performance of PLSR model for prediction of adulteration levels


Recently, Martins et al. predicted the
presence of whey protein in wheat flour by FT-IR
spectroscopy and multivariate analysis. The
PLSR model was applied to the acquired
spectra, and the best model
had R²cal = 0.99, R²pre = 0.98, RMSEC = 3.5,
and RMSEP = 3.00 (Martins et al., 2022).
The R² results were in agreement with
the present research, but the RMSE results were weaker.
In other research, Nur Arslan applied a PLSR
model to determine the amount of barley flour in
wheat flour. The statistical parameters of that
study were close to the results of the present study
(R² values were at least 0.994 and the RMSECV
was in the range 0.36-1.50%) (Arslan et
al., 2020). In another study, the prediction of
azodicarbonamide in wheat flour by
visible/near-infrared spectroscopy was
investigated by Che et al. Of the 3
models compared in that research (PLSR, Back
Propagation Neural Network, and Radial Basis
Function), the Radial Basis Function model had the
best prediction results, with a correlation
coefficient (R) and RMSEP of 0.99 and 0.54,
respectively (Che et al., 2017), in
agreement with the outcomes of the present study.


Conclusion 

The presence of Sodium hydrosulfite 
(Blankit) in wheat flour was investigated by 
FT-MIR spectroscopy. PCA as unsupervised 
and SVM and ANN as supervised models were 
applied to detect the adulteration and PLSR 
model as regression model was applied to 
quantify the amount of adulteration. The 
mentioned chemometric models were built after 
some preprocessing techniques. The acquired 
results for detection and quantification of 
Sodium Hydrosulfite proved that FT-IR 
spectroscopy can be a reliable method to detect 
and quantify Sodium hydrosulfite in wheat 
flour.